\section{Lower Bound}
\label{sec:lowerbound}

As demonstrated in the previous section, we need to ensure an error of
at most $O(1/\sqrt{n})$ in the computation of $D$ to be able to
conduct the KS-test with reasonable accuracy. In this section we will
show that at least $\Omega(\sqrt{n})$ information must be retained by
a sketch for estimating $D$ to attain this accuracy. This shows that
our $O(\sqrt{n}\log{n})$ upper bound is nearly tight. 

\begin{figure}[tb]
%\vspace{-4cm}
\centering
\includegraphics[width=.35\textwidth, angle=270]{figs/fig2.eps}
%\vspace{-2cm}
\caption{Lower bound illustration}
\label{fig:lb}
\end{figure}

We show the lower bound below for the two-sample case:
\begin{theorem}
\label{thm:lowerbound}
For arbitrary $n$, any sketch for computing the KS-statistic to within $O(1/\sqrt{n})$
precision in the two-sample test must retain $\Omega(\sqrt{n})$ samples.
\end{theorem}

\begin{proof} 
Fix any $n$. We define a stream (call it $S_1$) as follows. For some
$s < n$, let $\{v_1, \ldots, v_s\}$ be $s$ distinct values in the range
$[0, 1]$. The stream $S_1$ consists of these $s$ values each repeated
$\lfloor n/s\rfloor$ times in some order.

Now, if we assume that the sketch for $S_1$ does not retain all $s$
distinct values, then there is some value $v_i$ that is not tracked by
the sketch. We use this value $v_i$ to define another stream $S_2$ as
follows. Let the stream $S_2$ be identical to stream $S_1$ except that
all the $v_i$ values are removed and replaced with $\lfloor
n/s\rfloor$ additional copies of the value $v_{i+1}$. The cumulative
distribution function of the streams $S_1$ and $S_2$ are illustrated
in Figure~\ref{fig:lb}.

Now, since the sketch for $S_1$ did not retain any information
regarding the value of $v_1$, it cannot distinguish between the
streams $S_1$ and $S_2$. As a result, when comparing $S_1$ against
itself (when the KS-statistic is zero) and the stream $S_2$ (with
KS-statistic of $1/s$, as shown in Figure~\ref{fig:lb}), it must make
an error of at least $\frac{1}{2s}$ in one of the cases. Hence, any sketch
that retains less than $s$ bits of information may incur an error as
large as $\Omega(1/s)$. Substituting $s = \sqrt{n}$, we can infer that
any sketch that uses less than $\sqrt{n}$ bits will have at least
$\Omega(1/\sqrt{n})$ error.
\end{proof}

The proof for the one-sample case is very similar---we simply have to
replace the discrete distributions constructed in the above proof with
continuous ones. Because it is so similar, we omit the proof here.

